A Framework for Analyzing Linux System Overheads on HPC Applications
Abstract
Linux currently plays an important role in high-end computing systems, but recent work has shown that Linux-related processing costs and variability in network processing times can limit the scalability of HPC applications. Measuring and understanding these overheads is thus key to the future use of Linux in large-scale HPC systems. Unfortunately, currently available performance monitoring systems introduce large overheads, their performance data is generally not available on-line or from the operating system, and the data they collect is generally coarse-grained. In this paper, we present a low-overhead framework for solving one of these problems: making useful operating system performance data available to the application at runtime. Specifically, we have enhanced the Linux Trace Toolkit (LTT) to monitor the performance characteristics of individual system calls and to make per-request performance data available to the application. We demonstrate the ability of this framework to monitor individual network and disk requests, and show that the overhead of our per-request performance monitoring framework is minimal. We also present preliminary measurements of Linux system call overhead on a simple HPC application.
Similar resources
A Performance Comparison Using HPC Benchmarks: Windows HPC Server 2008 and Red Hat Enterprise Linux 5
A collection of performance benchmarks has been run on an IBM System X iDataPlex cluster using two different operating systems. Windows HPC Server 2008 (WinHPC) and Red Hat Enterprise Linux v5.4 (RHEL5) are compared using SPEC MPI2007 v1.1, the High Performance Computing Challenge (HPCC) and National Science Foundation (NSF) acceptance test benchmark suites. Overall, we find the performance of...
Management of Virtual Large-scale High-performance Computing Systems
Linux is widely used on high-performance computing (HPC) systems, from commodity clusters to Cray supercomputers (which run the Cray Linux Environment). These platforms primarily differ in their system configuration: some only use SSH to access compute nodes, whereas others employ full resource management systems (e.g., Torque and ALPS on Cray XT systems). Furthermore, the latest improvements i...
Towards a Comprehensive Framework for Telemetry Data in HPC Environments
A large number of 2nd generation high-performance computing applications and services rely on adaptive and dynamic architectures and execution strategies to run efficiently, resiliently, and at scale on today’s HPC infrastructures. They require information about applications and their environment to steer and optimize execution. We define this information as telemetry data. Current HPC platform...
Paravirtualization for HPC Systems
In this work, we investigate the efficacy of using paravirtualizing software for performance-critical HPC kernels and applications. We present a comprehensive performance evaluation of Xen, a low-overhead, Linux-based, virtual machine monitor, for paravirtualization of HPC cluster systems at LLNL. We investigate subsystem and overall performance using a wide range of benchmarks and applications...
InSight: A Framework for Application Diagnosis using Virtual Machine Record and Replay
Non-deterministic execution poses several challenges for the diagnosis (debugging, profiling, and execution state mining) of software systems (user-level applications and operating systems). While several techniques based on modified libraries, library wrappers, binary instrumentation, and memory shadowing exist, we aim to exploit the record and replay technique enabled by virtualization to...